feature trajectory
Learning Time in Static Classifiers
Ding, Xi, Wang, Lei, Koniusz, Piotr, Gao, Yongsheng
Real-world visual data rarely presents as isolated, static instances. Instead, it often evolves gradually over time through variations in pose, lighting, object state, or scene context. However, conventional classifiers are typically trained under the assumption of temporal independence, limiting their ability to capture such dynamics. We propose a simple yet effective framework that equips standard feedforward classifiers with temporal reasoning, all without modifying model architectures or introducing recurrent modules. At the heart of our approach is a novel Support-Exemplar-Query (SEQ) learning paradigm, which structures training data into temporally coherent trajectories. These trajectories enable the model to learn class-specific temporal prototypes and align prediction sequences via a differentiable soft-DTW loss. A multi-term objective further promotes semantic consistency and temporal smoothness. By interpreting input sequences as evolving feature trajectories, our method introduces a strong temporal inductive bias through loss design alone. This proves highly effective in both static and temporal tasks: it enhances performance on fine-grained and ultra-fine-grained image classification, and delivers precise, temporally consistent predictions in video anomaly detection.
- Oceania > Australia > New South Wales (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- Asia > Mongolia (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining > Anomaly Detection (0.74)
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Liu, Jiacheng, Zou, Chang, Lyu, Yuanhuiyi, Chen, Junjie, Zhang, Linfeng
Diffusion Transformers (DiT) have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. To solve this problem, feature caching has been proposed to accelerate diffusion models by caching the features in the previous timesteps and then reusing them in the following timesteps. However, at timesteps with significant intervals, the feature similarity in diffusion models decreases substantially, leading to a pronounced increase in errors introduced by feature caching, significantly harming the generation quality. To solve this problem, we propose TaylorSeer, which firstly shows that features of diffusion models at future timesteps can be predicted based on their values at previous timesteps. Based on the fact that features change slowly and continuously across timesteps, TaylorSeer employs a differential method to approximate the higher-order derivatives of features and predict features in future timesteps with Taylor series expansion. Extensive experiments demonstrate its significant effectiveness in both image and video synthesis, especially in high acceleration ratios. For instance, it achieves an almost lossless acceleration of 4.99$\times$ on FLUX and 5.00$\times$ on HunyuanVideo without additional training. On DiT, it achieves $3.41$ lower FID compared with previous SOTA at $4.53$$\times$ acceleration. %Our code is provided in the supplementary materials and will be made publicly available on GitHub. Our codes have been released in Github:https://github.com/Shenyi-Z/TaylorSeer
Asynchronous Event-Inertial Odometry using a Unified Gaussian Process Regression Framework
Li, Xudong, Wang, Zhixiang, Liu, Zihao, Zhang, Yizhai, Zhang, Fan, Yao, Xiuming, Huang, Panfeng
Recent works have combined monocular event camera and inertial measurement unit to estimate the $SE(3)$ trajectory. However, the asynchronicity of event cameras brings a great challenge to conventional fusion algorithms. In this paper, we present an asynchronous event-inertial odometry under a unified Gaussian Process (GP) regression framework to naturally fuse asynchronous data associations and inertial measurements. A GP latent variable model is leveraged to build data-driven motion prior and acquire the analytical integration capacity. Then, asynchronous event-based feature associations and integral pseudo measurements are tightly coupled using the same GP framework. Subsequently, this fusion estimation problem is solved by underlying factor graph in a sliding-window manner. With consideration of sparsity, those historical states are marginalized orderly. A twin system is also designed for comparison, where the traditional inertial preintegration scheme is embedded in the GP-based framework to replace the GP latent variable model. Evaluations on public event-inertial datasets demonstrate the validity of both systems. Comparison experiments show competitive precision compared to the state-of-the-art synchronous scheme.
AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression
Wang, Zhixiang, Li, Xudong, Zhang, Yizhai, Zhang, Fan, Panfeng, null
Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There are many methods for producing such estimations, but most boil down to a synchronous discrete-time fusion problem. However, the asynchronous nature of event cameras and their unique fusion mechanism with inertial sensors remain underexplored. In this paper, we introduce a monocular event-inertial odometry method called AsynEIO, designed to fuse asynchronous event and inertial data within a unified Gaussian Process (GP) regression framework. Our approach incorporates an event-driven frontend that tracks feature trajectories directly from raw event streams at a high temporal resolution. These tracked feature trajectories, along with various inertial factors, are integrated into the same GP regression framework to enable asynchronous fusion. With deriving analytical residual Jacobians and noise models, our method constructs a factor graph that is iteratively optimized and pruned using a sliding-window optimizer. Comparative assessments highlight the performance of different inertial fusion strategies, suggesting optimal choices for varying conditions. Experimental results on both public datasets and our own event-inertial sequences indicate that AsynEIO outperforms existing methods, especially in high-speed and low-illumination scenarios.
Efficient Continuous-Time Ego-Motion Estimation for Asynchronous Event-based Data Associations
Wang, Zhixiang, Li, Xudong, Liu, Tianle, Zhang, Yizhai, Huang, Panfeng
Event cameras are bio-inspired vision sensors that asynchronously measure per-pixel brightness changes. The high temporal resolution and asynchronicity of event cameras offer great potential for estimating the robot motion state. Recent works have adopted the continuous-time ego-motion estimation methods to exploit the inherent nature of event cameras. However, most of the adopted methods have poor real-time performance. To alleviate it, a lightweight Gaussian Process (GP)-based estimation framework is proposed to efficiently estimate motion trajectory from asynchronous event-driven data associations. Concretely, an asynchronous front-end pipeline is designed to adapt event-driven feature trackers and generate feature trajectories from event streams; a parallel dynamic sliding-window back-end is presented within the framework of sparse GP regression on SE(3). Notably, a specially designed state marginalization strategy is employed to ensure the consistency and sparsity of this GP regression. Experiments conducted on synthetic and real-world datasets demonstrate that the proposed method achieves competitive precision and superior robustness compared to the state-of-the-art. Furthermore, the evaluations on three 60 s trajectories show that the proposal outperforms the ISAM2-based method in terms of computational efficiency by 2.64, 4.22, and 11.70 times, respectively.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Understanding Self-attention Mechanism via Dynamical System Perspective
Huang, Zhongzhan, Liang, Mingfu, Qin, Jinghui, Zhong, Shanshan, Lin, Liang
The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence and has successfully boosted the performance of different models. However, current explanations of this mechanism are mainly based on intuitions and experiences, while there still lacks direct modeling for how the SAM helps performance. To mitigate this issue, in this paper, based on the dynamical system perspective of the residual neural network, we first show that the intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NN). Thus the ability of NN to measure SP at the feature level is necessary to obtain high performance and is an important factor in the difficulty of training NN. Similar to the adaptive step-size method which is effective in solving stiff ODEs, we show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values, which provides a new understanding about why and how the SAM can benefit the model performance. This novel perspective can also explain the lottery ticket hypothesis in SAM, design new quantitative metrics of representational ability, and inspire a new theoretic-inspired approach, StepNet. Extensive experiments on several popular benchmarks demonstrate that StepNet can extract fine-grained stiffness information and measure SP accurately, leading to significant improvements in various visual tasks.
- Asia > China (0.04)
- North America > United States > New York (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments
Lerato, Lerato, Niesler, Thomas
Dynamic time warping (DTW) can be used to compute the similarity between two sequences of generally differing length. We propose a modification to DTW that performs individual and independent pairwise alignment of feature trajectories. The modified technique, termed feature trajectory dynamic time warping (FTDTW), is applied as a similarity measure in the agglomerative hierarchical clustering of speech segments. Experiments using MFCC and PLP parametrisations extracted from TIMIT and from the Spoken Arabic Digit Dataset (SADD) show consistent and statistically significant improvements in the quality of the resulting clusters in terms of F-measure and normalised mutual information (NMI).
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York (0.04)
- Africa > South Africa (0.04)